15 research outputs found
Smallest singular value of sparse random matrices
We extend probability estimates on the smallest singular value of random
matrices with independent entries to a class of sparse random matrices. We show
that one can relax a previously used condition of uniform boundedness of the
variances from below. This allows us to consider matrices with null entries or,
more generally, with entries having small variances. Our results do not assume
identical distribution of the entries of a random matrix and help to clarify
the role of the variances of the entries. We also show that it is enough to
require boundedness from above of the r-th moment, r > 2, of the
corresponding entries.
Comment: 25 pages, a condition on one parameter was added in the statement of
Theorem 1.3 and Lemma 6.2, results unchanged
PAC-Bayesian Computation
Risk bounds, which are also called generalisation bounds in the statistical learning literature, are important objects of study because they give some information on the expected error that a predictor may incur on randomly chosen data points. In classical statistical learning, the analyses focus on individual hypotheses, and the aim is to derive risk bounds that are valid for the data-dependent hypothesis output by some learning method. Often, however, such risk bounds are valid uniformly over a hypothesis class, which is a consequence of the methods used to derive them, namely the theory of uniform convergence of empirical processes. This is a source of looseness in these classical kinds of bounds, which has led to debates and criticisms, and has motivated the search for alternative methods to derive tighter bounds.
The PAC-Bayes analysis focuses on distributions over hypotheses and randomised predictors defined by such distributions. Other prediction schemes can be devised based on a distribution over hypotheses; however, the randomised predictor is a typical starting point. Lifting the analysis to distributions over hypotheses, rather than individual hypotheses, makes available sharp analysis tools, which arguably account for the tightness of PAC-Bayes bounds. Two main uses of PAC-Bayes bounds are (1) risk certification, and (2) cost function derivation. The first consists of evaluating numerical risk certificates for the distributions over hypotheses learned by some method, while the second consists of turning a PAC-Bayes bound into a training objective, to learn a distribution by minimising the bound. This thesis revisits both kinds of uses of PAC-Bayes bounds. We contribute results on certifying the risk of randomised kernel and neural network classifiers, adding evidence to the success of PAC-Bayes bounds at delivering tight certificates. This thesis proposes the name “PAC-Bayesian Computation” as a generic name to encompass the class of methods that learn a distribution over hypotheses by minimising a PAC-Bayes bound (i.e. the second use case described above: cost function derivation), and reports an interesting case of PAC-Bayesian Computation leading to self-certified learning: we develop a learning and certification strategy that uses all the available data to produce a predictor together with a tight risk certificate, as demonstrated with randomised neural network classifiers on two benchmark data sets (MNIST, CIFAR-10).
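As a concrete illustration of risk certification (use (1) in the abstract), the following minimal Python sketch evaluates a certificate from the PAC-Bayes-kl bound, one standard bound of this family. The abstract does not say which bound the thesis uses, so treating this one as representative is an assumption. The function names and example numbers are hypothetical, and the empirical risk of the randomised predictor is assumed to be known exactly rather than estimated by sampling.

```python
import math


def binary_kl(q, p):
    """KL divergence between Bernoulli(q) and Bernoulli(p)."""
    eps = 1e-12
    p = min(max(p, eps), 1 - eps)  # keep the logarithms finite
    total = 0.0
    if q > 0:
        total += q * math.log(q / p)
    if q < 1:
        total += (1 - q) * math.log((1 - q) / (1 - p))
    return total


def kl_inverse_upper(q, c, tol=1e-10):
    """Upper inverse of the binary KL: approximately sup{p >= q : kl(q, p) <= c}.

    binary_kl(q, p) is increasing in p on [q, 1], so bisection applies.
    The upper end of the final bracket is returned so that the result
    never falls below the true value.
    """
    lo, hi = q, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if binary_kl(q, mid) <= c:
            lo = mid
        else:
            hi = mid
    return hi


def pac_bayes_kl_certificate(emp_risk, kl_qp, n, delta=0.05):
    """Risk certificate from the PAC-Bayes-kl bound (assumed bound, see lead-in).

    emp_risk: empirical risk of the randomised predictor, in [0, 1]
    kl_qp:    KL divergence between posterior Q and prior P
    n:        number of i.i.d. training examples
    delta:    the bound holds with probability at least 1 - delta
    """
    c = (kl_qp + math.log(2 * math.sqrt(n) / delta)) / n
    return kl_inverse_upper(emp_risk, c)


if __name__ == "__main__":
    # Hypothetical numbers, for illustration only.
    print(pac_bayes_kl_certificate(emp_risk=0.02, kl_qp=5000.0, n=60000))
```

The certificate grows with the KL term and shrinks as n grows, which is why the choice of prior matters for tightness.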
PAC-Bayes unleashed: generalisation bounds with unbounded losses
We present new PAC-Bayesian generalisation bounds for learning problems with
unbounded loss functions. This extends the relevance and applicability of the
PAC-Bayes learning framework, where most of the existing literature focuses on
supervised learning problems with a bounded loss function (typically assumed to
take values in the interval [0;1]). In order to relax this assumption, we
propose a new notion called HYPE (standing for HYPothesis-dependent
rangE), which effectively allows the range of the loss to depend on each
predictor. Based on this new notion we derive a novel PAC-Bayesian
generalisation bound for unbounded loss functions, and we instantiate it on a
linear regression problem. To make our theory usable by the largest audience
possible, we include discussions on actual computation, practicality and
limitations of our assumptions.
Comment: 24 pages
PAC-Bayes Analysis Beyond the Usual Bounds
We focus on a stochastic learning model where the learner observes a finite
set of training examples and the output of the learning process is a
data-dependent distribution over a space of hypotheses. The learned
data-dependent distribution is then used to make randomized predictions, and
the high-level theme addressed here is guaranteeing the quality of predictions
on examples that were not seen during training, i.e. generalization. In this
setting the unknown quantity of interest is the expected risk of the
data-dependent randomized predictor, for which upper bounds can be derived via
a PAC-Bayes analysis, leading to PAC-Bayes bounds.
Specifically, we present a basic PAC-Bayes inequality for stochastic kernels,
from which one may derive extensions of various known PAC-Bayes bounds as well
as novel bounds. We clarify the role of the requirements of fixed 'data-free'
priors, bounded losses, and i.i.d. data. We highlight that those requirements
were used to upper-bound an exponential moment term, while the basic PAC-Bayes
theorem remains valid without those restrictions. We present three bounds that
illustrate the use of data-dependent priors, including one for the unbounded
square loss.
Comment: In NeurIPS 2020. Version 3 is the final published paper. Note that
this paper is an enhanced version of the short paper with the same title that
was presented at the NeurIPS 2019 Workshop on Machine Learning with
Guarantees. Important update: the PAC-Bayes type inequality for unbounded
loss functions (Section 2.3) is new
Semi-Counterfactual Risk Minimization Via Neural Networks
Counterfactual risk minimization is a framework for offline policy
optimization with logged data which consists of context, action, propensity
score, and reward for each sample point. In this work, we build on this
framework and propose a learning method for settings where the rewards for some
samples are not observed, and so the logged data consists of a subset of
samples with unknown rewards and a subset of samples with known rewards. This
setting arises in many application domains, including advertising and
healthcare. While reward feedback is missing for some samples, it is possible
to leverage the unknown-reward samples in order to minimize the risk, and we
refer to this setting as semi-counterfactual risk minimization. To approach
this kind of learning problem, we derive new upper bounds on the true risk
under the inverse propensity score estimator. We then build upon these bounds
to propose a regularized counterfactual risk minimization method, where the
regularization term is based on the logged unknown-rewards dataset only; hence
it is reward-independent. We also propose another algorithm based on generating
pseudo-rewards for the logged unknown-rewards dataset. Experimental results
with neural networks and benchmark datasets indicate that these algorithms can
leverage the logged unknown-rewards dataset in addition to the logged
known-reward dataset.
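For orientation, the plain inverse propensity score (IPS) estimator on logged data can be sketched as follows. This is a minimal illustration of the standard estimator only, not the regularized or pseudo-reward methods proposed in the paper. The tuple layout, the function names, and the convention of estimating expected reward (rather than a risk defined from it) are assumptions made for this example. Samples whose reward is missing are skipped, which is exactly the information the plain estimator cannot use.

```python
def ips_value(logs, target_prob):
    """IPS estimate of the expected reward of a target policy.

    logs:        list of (context, action, propensity, reward) tuples, where
                 propensity is the logging policy's probability of the logged
                 action and reward is None when it was not observed
    target_prob: function (context, action) -> probability that the target
                 policy picks `action` in `context` (hypothetical interface)
    """
    known = [(x, a, p, r) for (x, a, p, r) in logs if r is not None]
    if not known:
        raise ValueError("no samples with an observed reward")
    # Reweight each observed reward by target probability / logging propensity.
    weighted = sum(r * target_prob(x, a) / p for (x, a, p, r) in known)
    return weighted / len(known)


if __name__ == "__main__":
    # Hypothetical log: uniform logging policy over two actions.
    logs = [("x1", 0, 0.5, 0.0), ("x2", 1, 0.5, 1.0), ("x3", 1, 0.5, None)]
    always_one = lambda x, a: 1.0 if a == 1 else 0.0
    print(ips_value(logs, always_one))
```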
Tighter risk certificates for neural networks
This paper presents an empirical study regarding training probabilistic
neural networks using training objectives derived from PAC-Bayes bounds. In the
context of probabilistic neural networks, the output of training is a
probability distribution over network weights. We present two training
objectives, used here for the first time in connection with training neural
networks. These two training objectives are derived from tight PAC-Bayes
bounds. We also re-implement a previously used training objective based on a
classical PAC-Bayes bound, to compare the properties of the predictors learned
using the different training objectives. We compute risk certificates that are
valid on any unseen examples for the learnt predictors. We further experiment
with different types of priors on the weights (both data-free and
data-dependent priors) and neural network architectures. Our experiments on
MNIST and CIFAR-10 show that our training methods produce competitive test set
errors and non-vacuous risk bounds with much tighter values than previous
results in the literature, showing promise not only to guide the learning
algorithm through bounding the risk but also for model selection. These
observations suggest that the methods studied here might be good candidates for
self-certified learning, in the sense of certifying the risk on any unseen data
without the need for data-splitting protocols.
Comment: Preprint under review
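To make the idea of a training objective derived from a PAC-Bayes bound more tangible, here is a minimal Python sketch of an objective of the classical (square-root) form, together with the KL divergence between two factorised Gaussian distributions over weights. It is an illustration under stated assumptions: Gaussian posterior and prior with diagonal covariance, an empirical loss bounded in [0, 1], and hypothetical function names. It does not reproduce the two tighter objectives studied in the paper.

```python
import math


def kl_diag_gaussians(mu_q, sigma_q, mu_p, sigma_p):
    """KL(Q || P) for factorised Gaussians given per-weight means and std devs."""
    total = 0.0
    for mq, sq, mp, sp in zip(mu_q, sigma_q, mu_p, sigma_p):
        total += math.log(sp / sq) + (sq**2 + (mq - mp) ** 2) / (2 * sp**2) - 0.5
    return total


def classic_objective(emp_loss, kl_qp, n, delta=0.05):
    """Classical PAC-Bayes style objective: empirical loss plus a complexity term.

    emp_loss: empirical loss of the randomised predictor, assumed in [0, 1]
    kl_qp:    KL divergence between posterior and prior
    n:        number of training examples
    """
    return emp_loss + math.sqrt(
        (kl_qp + math.log(2 * math.sqrt(n) / delta)) / (2 * n)
    )


if __name__ == "__main__":
    # Hypothetical three-weight example.
    kl = kl_diag_gaussians([0.1, -0.2, 0.3], [0.05] * 3, [0.0] * 3, [0.1] * 3)
    print(classic_objective(emp_loss=0.03, kl_qp=kl, n=60000))
```

Minimising such an objective over the posterior parameters trades empirical loss against distance from the prior.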
PAC-Bayes bounds for stable algorithms with instance-dependent priors
PAC-Bayes bounds have been proposed to get risk estimates based on a training
sample. In this paper the PAC-Bayes approach is combined with stability of the
hypothesis learned by a Hilbert space valued algorithm. The PAC-Bayes setting
is used with a Gaussian prior centered at the expected output. Thus a novelty
of our paper is using priors defined in terms of the data-generating
distribution. Our main result estimates the risk of the randomized algorithm in
terms of the hypothesis stability coefficients. We also provide a new bound for
the SVM classifier, which is compared to other known bounds experimentally.
Ours appears to be the first stability-based bound that evaluates to
non-trivial values.
Comment: 16 pages, discussion of theory and experiments in the main body,
detailed proofs and experimental details in the appendices
Progress in Self-Certified Neural Networks
A learning method is self-certified if it uses all available data to simultaneously learn a predictor and certify its quality with a statistical certificate that is valid on unseen data. Recent work has shown that neural network models trained by optimising PAC-Bayes bounds lead not only to accurate predictors, but also to tight risk certificates, bearing promise towards achieving self-certified learning. In this context, learning and certification strategies based on PAC-Bayes bounds are especially attractive due to their ability to leverage all data to learn a posterior and simultaneously certify its risk. In this paper, we assess the progress towards self-certification in probabilistic neural networks learnt by PAC-Bayes inspired objectives. We empirically compare (on 4 classification datasets) classical test set bounds for deterministic predictors and a PAC-Bayes bound for randomised self-certified predictors. We first show that both of these generalisation bounds are not too far from out-of-sample test set errors. We then show that in data starvation regimes, holding out data for the test set bounds adversely affects generalisation performance, while self-certified strategies based on PAC-Bayes bounds do not suffer from this drawback, suggesting that they might be a suitable choice for the small data regime. We also find that probabilistic neural networks learnt by PAC-Bayes inspired objectives lead to certificates that can be surprisingly competitive with commonly used test set bounds.
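As a point of reference for the comparison described in this abstract, a classical test set bound for a deterministic predictor can be sketched by inverting the binomial tail. This is one standard variant; whether it is the exact variant used in the paper is an assumption, and the function names are hypothetical. The bound requires data held out from training, which is the cost that self-certified strategies avoid.

```python
import math


def binom_cdf(k, m, p):
    """P[X <= k] for X ~ Binomial(m, p) with 0 < p < 1, summed in log space."""
    log_p, log_q = math.log(p), math.log1p(-p)
    total = 0.0
    for i in range(k + 1):
        log_term = (
            math.lgamma(m + 1) - math.lgamma(i + 1) - math.lgamma(m - i + 1)
            + i * log_p + (m - i) * log_q
        )
        total += math.exp(log_term)
    return min(total, 1.0)


def binomial_test_set_bound(errors, m, delta=0.05, tol=1e-9):
    """Upper bound on the true error rate from `errors` mistakes on m held-out points.

    Returns (approximately) the largest p with P[Bin(m, p) <= errors] >= delta;
    the true error rate is below this value with probability at least 1 - delta.
    The binomial CDF is decreasing in p, so bisection applies.
    """
    lo, hi = errors / m, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if binom_cdf(errors, m, mid) >= delta:
            lo = mid
        else:
            hi = mid
    return hi


if __name__ == "__main__":
    # Hypothetical numbers: 20 mistakes on 1000 held-out examples.
    print(binomial_test_set_bound(errors=20, m=1000))
```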